Technical Brief: Using Implicit Information to Identify Smoking Status in Smoke-Blind Medical Discharge Summaries
Authors: Wicentowski and Sydes
Abstract
As part of the 2006 i2b2 NLP Shared Task, we explored two methods for determining the smoking status of patients from their hospital discharge summaries, both when explicit smoking terms were present and when those same terms were removed. We developed a simple keyword-based classifier to determine smoking status from de-identified hospital discharge summaries. We then developed a Naïve Bayes classifier to determine smoking status from the same records after all smoking-related words had been manually removed (the “smoke-blind” dataset). The performance of the Naïve Bayes classifier was compared to the performance of three human annotators on a subset of the same training dataset (n=54) and against the evaluation dataset (n=104 records). The rule-based classifier was able to accurately extract smoking status from hospital discharge summaries when they contained explicit smoking words. On the smoke-blind dataset, where explicit smoking cues are not available, two Naïve Bayes systems performed less well than the rule-based classifier, but similarly to three expert human annotators.
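The smoke-blind setup described in the abstract can be illustrated with a minimal sketch. This is not the authors' system: the keyword list `SMOKING_TERMS`, the two labels, the add-one smoothing, and the four toy records are all assumptions made for illustration, and the abstract does not specify any of them. The sketch strips explicit smoking words from each record and then trains a multinomial Naïve Bayes classifier on the words that remain.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical keyword list; the abstract does not enumerate the removed terms.
SMOKING_TERMS = {"smoke", "smoker", "smokes", "smoked", "smoking",
                 "tobacco", "cigarette", "cigarettes", "nicotine"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def smoke_blind(text):
    """Return the tokens of a record with explicit smoking terms removed."""
    return [t for t in tokenize(text) if t not in SMOKING_TERMS]


class NaiveBayes:
    """Multinomial Naive Bayes with additive smoothing (alpha is an assumption)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter()              # label -> number of records
        self.vocab = set()

    def fit(self, docs, labels):
        for tokens, label in zip(docs, labels):
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            total = sum(counts.values())
            score = math.log(n_docs / total_docs)  # log prior
            for t in tokens:
                if t in self.vocab:  # tokens never seen in training are skipped
                    score += math.log((counts[t] + self.alpha)
                                      / (total + self.alpha * vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy records, for illustration only.
train = [
    ("Patient smokes one pack per day. History of COPD and emphysema.", "smoker"),
    ("Heavy tobacco use. Chronic bronchitis and COPD exacerbation.", "smoker"),
    ("Denies smoking. Admitted for ankle fracture after a fall.", "non-smoker"),
    ("Non-smoker. Elective knee replacement, uneventful recovery.", "non-smoker"),
]
model = NaiveBayes().fit([smoke_blind(text) for text, _ in train],
                         [label for _, label in train])

print(model.predict(smoke_blind("Admitted with COPD exacerbation and emphysema.")))
```

Even in this toy version, words such as "denies" survive the removal step, which hints at why implicit cues can still carry signal once the explicit terms are gone.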
Related articles
Identifying Smoking Status From Implicit Information in Medical Discharge Summaries
Human annotators and natural language applications are able to identify smoking status from discharge summaries with high accuracy when explicit evidence regarding their smoking status is present in the summary. We explore the possibility of identifying the smoking status from discharge summaries when these smoking terms have been removed. We present results using a Naïve Bayes classifier on a ...
Using Implicit Information to Identify Smoking Status in Smoke-blind Medical Discharge Summaries
J Am Med Inform Assoc. 2008;15:29–31. DOI 10.1197/jamia.M2440.
Technical Brief: Using Implicit Information to Identify Smoking Status in Smoke-blind Medical Discharge Summaries
As part of the 2006 i2b2 NLP Shared Task, we explored two methods for determining the smoking status of patients from their hospital discharge summaries when explicit smoking terms were present and when those same terms were removed. We developed a simple keyword-based classifier to determine smoking status from de-identified hospital discharge summaries. We then developed a Naïve Bayes classif...
Emotion Detection in Suicide Notes using Maximum Entropy Classification
An ensemble of supervised maximum entropy classifiers can accurately detect and identify sentiments expressed in suicide notes. Using lexical and syntactic features extracted from a training set of externally annotated suicide notes, we trained separate classifiers for each of fifteen pre-specified emotions. This formed part of the 2011 i2b2 NLP Shared Task, Track 2. The precision and recall of...
Gains from diversification on convex combinations: A majorization and stochastic dominance approach
By incorporating both majorization theory and stochastic dominance theory, this paper presents a general theory and a unifying framework for determining the diversification preferences of risk-averse investors and conditions under which they would unanimously judge a particular asset to be superior. In particular, we develop a theory for comparing the preferences of different convex combination...